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Complex networks have been successfully employed to represent different levels of biological 
systems, ranging from gene regulation to protein -protein interactions and metabolism. 
Network-based research has mainly focused on identifying unifying structural properties, 
such as small average path length, large clustering coefficient, heavy-tail degree distribution 
and hierarchical organization, viewed as requirements for efficient and robust system architect- 
ures. However, for biological networks, it is unclear to what extent these properties refiect the 
evolutionary history of the represented systems. Here, we show that the salient structural prop- 
erties of six metabolic networks from all kingdoms of life may be inherently related to the 
evolution and functional organization of metabolism by employing network randomization 
under mass balance constraints. Contrary to the results from the common Markov- chain 
switching algorithm, our findings suggest the evolutionary importance of the small-world 
hypothesis as a fundamental design principle of complex networks. The approach may help 
us to determine the biologically meaningful properties that result from evolutionary pressure 
imposed on metabolism, such as the global impact of local reaction knockouts. Moreover, 
the approach can be applied to test to what extent novel structural properties can be used to 
draw biologically meaningful hypothesis or predictions from structure alone. 
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1. INTRODUCTION 

The central findings in network-based research suggest 
that there exist simple mechanisms directing the evol- 
ution of both engineered and natural networks [1-10]. 
However, the relation between the functions of a bio- 
logical system and its network properties is hardly 
understood. Therefore, the advantage of using network 
representations for positing meaningful hypotheses 
about biological systems remains largely debatable [11]. 

Properties of biological systems arise from two fun- 
damental origins: physical principles^ universally 
constraining the feasibility of biochemical processes, 
and evolutionary pressure^ bearing the specific func- 
tional abilities required for an organism's vitality [12]. 
The former comprise well-understood physical laws, 
such as mass balance and thermodynamics, which 
constitute the basic requirements imposed on all living 
systems. In contrast, evolution depends on the interplay 
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of complex phenomena, such as adaptation to environ- 
mental changes, symbiosis and biodiversity of 
populations [13,14], leading to diverse cellular func- 
tions. Consequently, the unique properties related to 
the functions of a biological system are a result of its 
evolutionary history. 

Explaining cellular behaviour through network rep- 
resentations and their properties is a key challenge of 
modern biology. While many structural properties of 
metabolic networks are similar to those of other complex 
networks [10], it is unclear whether they are a conse- 
quence of the evolutionary history or merely arise as a 
result of general physical principles. Here, we apply a ran- 
domization method to determine which properties of 
metabolic networks, represented as bipartite metabolite- 
reaction graphs, may result from evolutionary pressure. 
This is an essential step in understanding the relation 
between the functional characteristics of biological systems 
and their network representations. 

The common approach for estimating the relevance 
of a network property is to determine the statistical 
significance (p- value) by comparing the value of 
the property in the investigated network with those in 
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the null-model distribution obtained from randomized 
networks [15]. Clearly, the significance of a property 
strongly depends on the chosen null model, which 
should be constrained to preserve universal network 
properties [16,17]. Since the p- value is the probability 
that the value of a property originates from the null- 
model distribution, a statistically significant property 
is likely to have emerged from some non-arbitrary pro- 
cess infiuencing network evolution independently of 
the imposed constraints. 

In virtually all network-based studies [18-24], a 
Markov-chain switching algorithm, switch randomiz- 
ation^ has been employed to determine the 
significance of network properties by generating ran- 
domized networks with preserved degree sequence. Its 
motivation stems from the finding that heavy-tail 
degree distributions are a universal feature of complex 
networks. This generic null model can be applied to 
any type of network, and guarantees the independence 
of an identified property from vertex degrees. We 
demonstrate how switch randomization affects the 
citric acid (TCA) cycle, a central respiratory metaboHc 
pathway of outstanding importance for aerobic organ- 
isms (figure la): two reactions substratci ^ product i 
and substrate2 product2 are substituted with new 
reactions substratci product2 and substrate2 
product 1, ensuring that the vertex degrees remain 
unchanged (figure lb). Since chemical feasibility is dis- 
regarded, a reaction that converts a-ketoglutarate into 
succinyl-CoA may be generated, where several atoms 
are created out of nothing. Hence, it remains hypothe- 
tical to what extent the properties, identified as 
significant with this method, relate to the function of 
the network, as they could well result from universal 
physical constraints imposed during network evolution. 

2. RESULTS 

2.1. Measuring evolutionary significance 

To identify the properties that originate from evolution- 
ary pressure, a network should be compared with 
random networks that evolved free of evolutionary 
pressure, but persistently satisfy all relevant physical 
constraints. As this is practically impossible to simulate, 
we apply our recent method for randomizing metabolic 
networks while preserving mass balance of the bio- 
chemical reactions [25]. A reaction rwith substrate set 
S and product set P is mass balanced if the number of 
substrate atoms equals the number of product atoms: 

tt5,r • ^5 = ^ cip,r ' rrip. (2.1) 

where m^, are the vectors of sum formulas of s and p, 
respectively, and a^^-^, ap^r their stoichiometric coef- 
ficients (see §4). The mass-balanced randomization of 
the TCA cycle does not violate this basic physical 
constraint, as shown in figure Ic. 

Thermodynamic properties, refiecting the energy 
change of reactions, constitute another important phys- 
ical requirement for metabolic networks. As shown in 
figure 2, the reactions generated by mass-balanced ran- 
domization of the Escherichia coli network are 



characterized by plausible Gibbs-free energy changes 
under standard conditions (pH = 7, T= 298.15 K, see 
the electronic supplementary material) [26]. In con- 
trast, switch randomization results in unrealistic 
energy ranges. By preserving mass balance and thermo- 
dynamic properties during randomization, our null 
model imposes realistic physical constraints on the gen- 
erated randomized networks. This ensures that the 
significant properties are independent of the fundamen- 
tal physical requirements, and instead are likely to 
result from evolutionary pressure. Therefore, we refer 
to the statistically significant properties under the pro- 
posed null model as evolutionary significant. 

For illustration, consider a landscape formed by the 
values of any given property over all randomized net- 
works (figure 3). The constrained networks, obtained 
by mass-balanced randomization, carve out a region in 
the vicinity of the original network that is embedded in 
the region of unconstrained networks resulting from 
switch randomization. As these regions exhibit different 
distributions of values, illustrated by different magni- 
tudes of the peaks, an evolutionary significant property 
may only be identified when comparing the property of 
the original network with the constrained region. 

2.2. Biosynthetic capabilities 

To verify our approach, first we determine the 
evolutionary significance of the scope size distribution 
in the genome-scale metabolic networks of six 
model organisms: Bacillus subtilis, E. coli (bacteria), 
Saccharomyces cerevisiae (fungi), Chlamydomonas 
reinhardtii (protista), Arabidopsis thaliana (plantae) 
and Homo sapiens (animaha; see §4). The scope rep- 
resents the set of compounds that can be produced in 
a metabolic network from a given set of initial nutrients 
[27]. We determine the scope size distribution of each 
network by repeatedly calculating the scope for 5000 
randomly chosen sets of nutrient compounds, one set 
at a time, according to the following procedure: (i) 
from the initial set of nutrients, determine the reactions 
for which all substrates are contained in the nutrient 
set; (ii) add the products of these reactions; and (iii) 
repeat the procedure, until no more products can be 
added (see electronic supplementary material, §1.3). 

The scope size distribution characterizes the biosyn- 
thetic capability of a network and has been shown 
to exhibit a strong correlation with the evolutionary 
history of organisms [28,29]. After applying mass- 
balanced randomization to the six networks, we 
compare the scope size distributions of each organism 
and its randomized network ensemble, and determine 
p- values using the Kolmogorov-Smirnov test (see §4). 
We find the scope size distributions to be evolutionary 
significant for all studied organisms (p- values <10~^^; 
electronic supplementary material, table S4 and figure 
SI), which demonstrates that our method correctly 
identifies the interdependence of the network property 
and its evolutionary background. 

2.3. Small- world property 

In the following, we focus on determining the evolution- 
ary significance of salient network properties that have 
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Figure 1. Illustration of how switch and mass-balanced randomization of the genome-scale metabolic network of Escherichia coli 
affect the TCA cycle, (a) The TCA cycle in Escherichia coli, consisting of eight reactions and 22 compounds. Compound names 
are shown with corresponding sum formulas, irreversible reactions are represented by solid squares and reversible reactions are 
denoted by blank squares. Internally, a reversible reaction is represented by one vertex for each direction, in order to adequately 
model the substrate -product relationships (see §4). (b) Reactions involving metabolites from the TCA cycle (bold arrows and 
names) after applying switch randomization. The degrees of compounds and reactions are preserved, but the generated reactions 
violate fundamental physical constraints (see inlay). Note that the reactions shown are obtained from randomization of the 
entire network of Escherichia coli; the degrees therefore do not correspond to those shown in (a), (c) All reactions obtained 
by mass-balanced randomization are chemically feasible owing to balanced atom masses and realistic thermodynamic energy 
ranges, as indicated by the sum formulas and stoichiometric coefficients (thermodynamic data not shown). 



been extensively studied in complex network research 
and are prominently applied in biological studies. In par- 
ticular, we analyse the small- world property [30], defined 
by a large clustering coefficient in conjunction with 
small average path length, and the metabolite degree 
distribution [8] (electronic supplementary material, 
table S2). We find that the clustering coefficient is sig- 
nificant in all species (j?- values < 10~^), regardless of 
the applied null model. On the other hand, the average 
path length is evolutionary significant with p- values < 
0.025 in all species (electronic supplementary 
material, table S4). With switch randomization, this 
property is significant ( p- values < 10 ~^) in all but 
S. cerevisiae (p-value =0.77; electronic supplementary 
material, table S5). 



More importantly, we may now assess the impor- 
tance of the small-world phenomenon by determining 
whether this property is more pronounced in the 
analysed networks when compared with their random- 
ized variants. Interestingly, in each species we find that 
the average path length is smaller and the clustering 
coefficient is greater than the values of the respective 
properties obtained from mass-balanced randomization 
(figure 4). This finding indicates that the small- 
world property is independent of physical constraints, 
and thus likely to be of evolutionary importance for 
metabolic networks. By contrast, when comparing 
the networks with their switch randomized ensem- 
bles, we arrive at a contrary conclusion — larger 
average path lengths and smaller clustering coefficients 



J. R. Soc. Interface (2012) 



Significant metabolic network properties G. Easier et al. 1171 



0.1 - 



o 



10- 



10- 




10-4 - 



10 




-5 U 

-8000 



8000 



Figure 2. Distributions of Gibbs-free energy changes under standard conditions (A^C*^ ) in Escherichia coli (black), and averaged 
over 10^ mass-balanced (blue) and switch randomized (red) networks. Energy changes in E. coli have a mean of 7.5 and standard 
deviation 15.1; mass-balanced randomized networks have a similar mean of 6.5 and standard deviation 53.5. In contrast, switch 
randomization generates implausible energy ranges with a mean of 32.5 and standard deviation 847.3. 



are prominent in real- world metabolic networks. 
Therefore, the results from switch randomization 
suggest that metaboHc networks are the opposite of 
small worlds, disproving the small- world hypothesis. 
Moreover, this finding hints at two major hazards 
of network null models: (i) the results obtained cru- 
cially depend on the model chosen, and (ii) the 
application of a generic null model that provides an 
unrealistically constrained environment may lead to 
counterintuitive results. 



2.4' Degree distributions 

Next, we analyse the metabolite degree distributions, 
where the degree of a metabolite is the number of reac- 
tions it is involved in either as a substrate or a product. 
The degree can be interpreted as metabolite specificity, 
with highly specific metabolites occurring in only 
few reactions. To our knowledge, the significance of 
degree distributions was never studied, since switch 
randomization is unsuited for this task. The degree 
distributions of all six organisms are evolutionary sig- 
nificant ( p- values < 10"^''; electronic supplementary 
material, table S4 and figure S3), suggesting that the 
patterns of metabolite specificities across different 
organisms emerge as a consequence of their evolution- 
ary history, and not from the imposed physical 
constraints. This finding complements the well-known 
evolutionary requirement of a network architecture 



that is robust to random errors, as exhibited by the 
heavy-tail degree distributions [8]. 

2,5, Reaction centrality 

Finally, we propose a measure for determining the 
global importance of individual reactions, which is 
based on a centrality index previously used in sociologi- 
cal studies [31] (also referred to as Hubbell Index). For 
two reactions and rj , we define the dependence of rj 
on ri as the largest ratio by which contributes to the 
overall production of an intermediary c (i.e. a com- 
pound that is produced by and consumed by rj ): co 
) = maxc(iin(c)~"^, where c^n(c) is the in- degree 
of c, which is the total number of reactions producing 
c. Note that this definition corresponds to the strength 
of impact of a knockout of on rj , where co {ri, rj) = 1, 
if rj becomes inoperable upon knockout of (e.g. if 
and rj are neighbours in a linear chain of reactions), 
and 6D(r^, rj ) ~ 0, if the intermediaries required by rj 
can be produced by many other reactions in the 
network. 

The global impact of the knockout of a reaction on 
the entire network, which we call reaction centrality, is 

i^(n) = ^ v(r,)-o;(n,r,), (2.2) 

where R is the set of all reactions in the network, and 
a)(r^, rj) = 0, if and rj do not share any intermediary 
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Figure 3. Illustration of a landscape of property values over all 
randomized networks. The property of the original network 
stands out from the inner region of constrained networks, 
but becomes inconspicuous in the outer region of uncon- 
strained networks. Therefore, only by comparison with the 
constrained networks, one may detect the evolutionary signifi- 
cance of the property. 



compound (i.e. and are not directly connected). 
This measure accounts for the direct dependencies 
between reactions through their intermediary com- 
pounds, as well as the global importance of the 
affected reactions: a knockout may affect only few 
other reactions directly, but can still have a large 
impact on the network, if an important reaction is 
affected indirectly (e.g. the knockout of a reaction at 
the beginning of a linear chain that leads to a reaction 
producing many important compounds). 

Equation (2.2) can be written in matrix form as 
Av = v, where Aij= o){ri, rj). In order to solve this 
eigenvalue problem, we need to ensure that the inverse 
of A exists, which can be achieved by the PageRank 
transformation [32]. In particular, the transformed 
matrix A^ is obtained by normalizing the columns of 
A and applying a damping factor d: 



A' ■ 



d-A, 



E,A. + a-^)/i^i' 



which yields the Markov chain represented by A' 
ergodic, as the corresponding network is strongly con- 
nected, and ensures that the largest eigenvalue is 
1. In order to minimize the diluting effect of the damp- 
ing factor on the topology of A^ we choose d=0.99. 
The eigenvector v corresponding to the eigenvalue 1 
of A' then contains the global centrality values of the 
reactions in the network, where v{i) corresponds to 
the reaction centrality of the ith reaction. The calcu- 
lation for large networks is tractable using a Fortran 
implementation of the Implicitly Restarted Arnoldi 
Method [33]. 

We determine a p-value for each reaction by compar- 
ing its centrality value in the original network with 
those obtained from mass-balanced randomized net- 
works while preserving the reaction itself. In order to 
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Figure 4. Characteristic path lengths {L) and clustering coeffi- 
cients ((7) of the six investigated metabolic networks (black 
dots) and averaged values of their mass-balanced (blue tri- 
angles) and switch randomized (red crosses) ensembles. 
Compared with the mass-balanced null model, characteristic 
path lengths are small and clustering coefficients large in all 
six organisms, confirming the small world hypothesis. Con- 
trarily, in comparison to the switch based null model, 
characteristic path lengths are large and clustering coeffi- 
cients small. The standard deviation is below 0.02 for each 
randomized distribution. 



estimate the effect of evolutionary pressure towards 
high centrality values, we apply a one-sided test with 
the null hypothesis that the values obtained from ran- 
domization are at least as large as the values of the 
original reactions (see §4). 

Table 1 shows the reactions that have a significant 
centrality (;?- value < 0.025) in at least three of the ana- 
lysed species (see the electronic supplementary material, 
table S7 for a complete list). The references provide evi- 
dence that each reaction is of outstanding importance 
for metabolism, as demonstrated by their evolutionary 
ubiquity, severity of knockout or inhibition effects, and 
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Table 1. Enzymes catalysing the reactions with highly significant centrality across species. All reactions with centrality 
p- values < 0.025 in at least three of the following species: Bacillus subtilis (BS), Escherichia coli (EC), Sac char omyces 
cerevisiae (SC), Chlamydomonas reinhardtii (CR), Arabidopsis thaliana (AT) and Homo sapiens (HS). A checkmark indicates 
that the reaction catalysed by the enzyme has a significant centrality in the corresponding species; a hyphen indicates not 
significant; n.a. indicates the corresponding enzyme is not annotated for the species. 



enzyme 


EC no. 


BS 


EC 


SC 


CR 


AT 


HS 


references 


catalase 


1.11.1.6 


/ 


/ 


/ 




/ 


n.a. 


[34 


,35] 


superoxide dismutase 


1.15.1.1 


/ 


/ 


n.a. 




/ 


/ 


[36- 


-38] 


carbonic anhydrase 


4.2.1.1 


/ 




/ 


/ 


/ 


n.a. 


[39- 


-45] 


L-arabinose isomerase 


5.3.1.4 


/ 


/ 


n.a. 


n.a. 


n.a. 


/ 


[46 


,47] 


phosphoglycerate mutase 


5.4.2.1 


/ 




/ 






/ 


[48- 


-51] 



clinical applications. For instance, catalase (EC 1.11.1.6) 
inactivation was shown to have severe effects on the life- 
span of S. cerevisiae cells [34]. Superoxide dismutase 
(EC 1.15.1.1) is essential for defense against oxygen 
toxicity and aerobic growth in eukaryotes [36,37], and 
is involved in a multitude of diseases [38]. Carbonic 
anhydrase (EC 4.2.1.1) fulfil diverse metabolic functions 
in organelles, tissues and membranes of virtually all 
species, is used as a drug target for various diseases 
and is one of the evolutionary oldest enzymes [39-45]. 
The numerous experimental corroborations suggest 
that the proposed centrality index, in conjunction with 
the evolutionary significance determined by using our 
null model, could be used to predict enzymes responsible 
for maintaining organismal viability solely from the 
network structure. 

For comparison, when repeating the analysis using 
switch randomization, the picture is less clear. In 
S. cerevisiae, A. thaliana and H. sapiens, 89, 27 
and 14 per cent of the reactions have a p-value of 
0.0099, rendering the analysis useless at least for the 
first two species. Five reactions have a significant cen- 
trality in at least two of the remaining three analysed 
species (electronic supplementary material, tables S6 
and S8). We omit a detailed statistical analysis of 
these initial results, which will be necessary to draw 
further conclusions. 

3. DISCUSSION 

To conclude, we proposed a novel method to reveal the 
relation between network properties and their evol- 
utionary background by preserving the universal 
physical principles that constrain the design of meta- 
bolic networks. Any property that originates from 
evolutionary pressure, and thus relates to an important 
biological function, should not be observed in artificial 
metabolic networks, which evolved free of evolutionary 
pressure, but satisfy all relevant physical constraints. 
This should even hold for properties evolved from com- 
plex time-dependent phenomena, if they are reflected in 
the ultimately observed network. 

We recognize that the proposed method only preserves 
mass balance and thermodynamic constraints, while 
other physical principles, such as electric charges, may 
also be relevant for metabolic network properties. Never- 
theless, the considered physical constraints are the most 
fundamental and ubiquitous ones. Therefore, we believe 
that the method is a reasonable first approach to extract 



the biological importance of metabolic network proper- 
ties. Accounting for additional physical constraints is 
complicated by the lack of reliable data for genome-scale 
metabolic networks; however, we expect such extensions 
to become possible in the future, which should further 
improve the biological relevance of the significance 
measure and the accuracy of the resulting predictions. 

In contrast to the commonly applied switch rando- 
mization, our approach provides a realistic network 
background, and attributes an important evolutionary 
role to the small- world property and heavy-tail degree dis- 
tributions. Our findings shed new light on the conclusions 
of previous studies, and suggest that the salient network 
properties are indeed a product of evolutionary pressure. 
Therefore, these properties carry important biological 
information, and can be justifiably used to generate 
meaningful hypotheses for experimental research. 

We demonstrate that the proposed centrality index is 
one such network property that determines reactions 
important for viability of organisms. The method 
could therefore be used to identify candidate reactions 
for metabolic engineering and drug development. The 
results provide an impetus for addressing the long- 
standing doubts concerning the biological relevance of 
network properties. In addition, the proposed null 
model could be employed to verify the evolutionary 
assumptions in constraint-based approaches [52] and 
to provide an interface to synthetic biology studies. 

Finally, we envision that, similar to the proposed 
approach for metabolic networks, specifically designed 
null models will be developed for other physically con- 
strained systems, represented by gene- regulatory, 
protein -protein interaction and signalling networks. 
For instance, transcription factors depend on cis- 
elements and DNA-binding domains, which constrain 
the sequence of genes by which they are encoded. Like- 
wise, protein interactions and signalling interactions 
depend on functional domains and binding sites. Devel- 
opment of null models which integrate the governing 
physical constraints of such systems will likely stimulate 
novel insights into the structure- function relationship 
in complex biological networks. 



4. METHODS 

4-1' Genome-scale metabolic networks 

We conduct our analyses on the most widely used 
genome-scale metabolic networks of six model 
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organisms from all kingdoms of life: B. suhtilis [53], 
E. coli [26], S. cerevisiae [54], C. reinhardtii [55], 
A. thaliana [56] and H. sapiens [57]. The sizes of the 
networks vary according to the complexity of the 
represented organisms, ranging from 855 reactions 
and 766 compounds {B. suhtilis) to 2819 reactions 
and 2691 compounds {H. sapiens). Resulting from the 
bipartite graph reconstruction, detailed in §4.2, the 
number of vertices and edges varies accordingly, from 
1877 vertices and 5368 edges {B. suhtilis) to 7059 
vertices and 19651 edges {H. sapiens). The networks 
further differ in their quality regarding mass balance 
of reactions, availability of information on reversible 
reactions, and the number of (strongly) connected com- 
ponents (electronic supplementary material, table S3): 
only the network of E. coli is fully balanced and consists 
of one connected component. 

4-2. Mass-balanced randomization 

To estimate the evolutionary significance of network 
properties, we generated lO"^ mass-balanced randomized 
networks for each of the six analysed genome- 
scale metabolic networks. A metabolic network is rep- 
resented as a directed bipartite graph G = ( U 
E"), where is the set of compound vertices, the 
set of reaction vertices and E Q {V^x V^) \J {V^ x V^) 
is the set of directed edges denoting substrate -reaction 
and product -reaction relationships. For a compound c 
E we denote by E f^'^ its mass vector^ i.e. the 
vector representation of c over n chemical elements. 
Here, we consider only the six most abundant elements 
in biological systems [58]: carbon (C), hydrogen (H), 
nitrogen (N), oxygen (O), phosphorus (P) and sulphur 
(S). The mass vector of water is then (0,2,0,1,0,0) • 
(C,H,N,0,P,S)^. Reversible reactions are represented 
by one reaction vertex for each direction: and r~, 
such that Till = Tout and r+ut = ^m- 

In order to uniformly randomize a network while pre- 
serving mass balance, each possible mass-balanced 
network has to be generated with equal probability. 
This requires enumeration of all possible sets of sub- 
strates and products, for which equation (2.1) is 
satisfied. As this problem is a special case of the 
Knapsack problem [59], the number of possible mass- 
balanced networks is at least exponential in the 
number of compounds. 

We approach the complexity of the general problem 
by applying a new method for mass-balanced 
randomization, introduced in Easier et al. [25]. The 
set of possible solutions to equation (2.1) is restricted 
twofold: (i) the in- and out-degrees of reactions are 
preserved and (ii) the substitution of compounds is 
limited to certain subsets, as detailed later, which 
allows to easily find a solution for equation (2.1). 
The first restriction is in line with the observation 
that reaction degrees are biochemically constrained 
by the number of interacting compounds. The second 
allows to divide the randomization procedure into a 
precalculation step and an actual randomization. As 
a result, the generation of a large set of mass-balanced 
randomized networks becomes computationally 
feasible. 



The randomization procedure consists of two steps: 
In the first step, for a given metabolic network G, we 
determine the classes of mass equivalent compounds 
and pairs of compounds from Vc{G). Two compounds 
are called mass equivalent, if the mass vector of one 
compound is a multiple of the other (e.g. CO2 and 
C2O4). Two pairs of compounds are called mass equiv- 
alent, if the sum of mass vectors of one pair is a multiple 
of the sum of mass vectors of the other pair (e.g. 
(CH2O, CO2 ) and (C4H2O4, H2O2 )). This definition 
ensures that the mass vectors of two compounds 
(and the sums of the mass vectors of two pairs of 
compounds) from the same mass equivalence class 
differ only by rational factors (e.g. 2CH2O + 2CO2 = 
C4H2O4 + H2O2 ). The precalculation of mass equival- 
ent compounds is to be executed only once for all 
subsequent randomizations of the same network and 
renders the generation of large sets of mass-balanced 
randomized networks computationally feasible (see 
supplementary table SI in [25] for a performance 
comparison to switch randomization). 

In the second step, the reactions of G are randomized 
while preserving mass balance. To randomize a reaction 
chosen uniformly at random from ( G) , its substrates 
and products are replaced by randomly chosen substi- 
tutes from their corresponding mass equivalence 
classes. When substituting an individual substrate or 
product, the stoichiometric coefficients of the new reac- 
tion are obtained by multiplying the corresponding 
previous coefficients with the earlier mentioned factor, 
such that equation (2.1) is satisfied. For the substi- 
tution of a pair of substrates or products, the 
stoichiometric coefficients satisfying equation (2.1) are 
determined by solving a system of n linear equations 
with two unknowns (electronic supplementary material, 
table SI for examples). In case there is no solution, the 
substitution is not carried out. The output from this 
step is an (almost) uniformly randomized network in 
which stoichiometric coefficients are changed, edges 
are replaced and, consequently, the degrees of the 
compounds are altered [25]. The approach is in line 
with the observation that some fundamental properties 
should be fixed while carrying out the randomization — 
here, these are the degrees of the reaction vertices and 
mass balance. 

4-3. Calculation of p-values 

The analysed properties are calculated in the original 
metabolic network and in each of the 10^ randomized 
networks. For the average path length and cluster- 
ing coefficient, we derive a 2;-score, z={x—y)/a^ 
from the original value the average randomized 
value and the standard deviation of randomi- 
zed values a. The two-sided p-value is determined as 

p=2Ji:|A/-(0,l). 

For comparing the metabolite degree and scope 
size distributions of the metabolic networks with 
their randomized versions, we apply the two-sample 
Kolmogorov-Smirnov test. From the cumulative distri- 
bution Fn of the property in the original network, and 
the joint cumulative distribution F^' of the randomized 
networks, a test statistic is derived as dn^n' = sup^; 
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l^n(^) ~ F^{x)\^ where n and n! are the number of 
values in the original, respectively t he joint rand omized 
distributions. The p- value is p = ^J nn' / (n + n')dn^n'' 

For each reaction vertex r, we determine its central- 
ity, v(r), in the original network and in 100 randomized 
networks, which are obtained by preserving r and ran- 
domizing the remaining reactions. The p- value of r is 
Pr = (^^ +l)/(n' + 1), where is the number of ran- 
domized networks, in which the centrality of r is at 
least as large as in the original network, and = 100. 
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